An Efficient Clustering-Classification Method in an Information Gain NRGA-KNN Algorithm for Feature Selection of Micro Array Data

Author

  • Akey Sungheetha
Abstract

Gene expression data obtained with microarray techniques have been used effectively for the classification and diagnosis of cancer nodules. Numerous data mining techniques, such as clustering, are currently applied to identify cancer from gene expression data. Clustering is an unsupervised learning technique used to discover grouping structure in a set of data. Feature selection for clustering is difficult because it is not known in advance which data attributes are relevant, and because the data carry no class labels, so there is no clear criterion to guide the search. A further issue in clustering is determining the number of clusters, which affects the performance of feature selection. Gene expression databases have great potential as a medical diagnostic tool because they represent the state of a cell at the molecular level. The training data sets available for classifying cancer types generally have a fairly small sample size compared with the number of genes involved. Feature selection, which is treated as an optimization problem in machine learning, reduces the number of features, removes noisy and redundant data, and yields acceptable classification accuracy. Selecting significant genes from microarray data therefore poses a formidable challenge to researchers, owing to the high dimensionality of the features and the usually small sample size. This paper proposes a clustering algorithm based on a hybrid of information gain and a genetic algorithm for feature selection in microarray data sets. Information Gain (IG) is used to select important feature subsets (genes) from all features in the gene expression data, and a Non-Dominated Ranked Genetic Algorithm (NRGA) is employed for the actual feature selection. The K-nearest neighbor (K-NN) method is used to evaluate the NRGA. Experimental results show that the proposed clustering-based method effectively reduces the number of gene expression levels and gives accurate feature selection compared with other methods.

[Akey Sungheetha, Dr. J. Suganthi. An Efficient Clustering-Classification Method in an Information Gain NRGA-KNN Algorithm for Feature Selection of Micro Array Data. Life Sci J 2013; 10(7s): 691-700] (ISSN: 1097-8135). http://www.lifesciencesite.com. 108

Keywords: Feature Selection, Gene Expression, Genetic Algorithm, Non-Dominated Ranked Genetic Algorithm, Information Gain, K-nearest neighbor (K-NN)
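The two outer stages of the pipeline described in the abstract, an information gain filter followed by a K-NN evaluation of a candidate gene subset, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy data, the choice to split each gene at its mean expression value, and the use of leave-one-out accuracy are all assumptions made for illustration, and the NRGA search itself is not shown. In a genetic algorithm, a score such as the K-NN accuracy below could plausibly serve as the fitness of a candidate subset.

```python
import math
from collections import Counter

# Made-up toy data: 6 samples x 3 genes, NOT taken from the paper.
TOY_SAMPLES = [
    [1.0, 5.0, 0.4],
    [1.2, 1.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.0, 1.5, 0.4],
    [3.2, 4.5, 0.1],
    [2.9, 0.5, 0.3],
]
TOY_LABELS = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain of splitting one gene's values at a threshold."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    if not low or not high:
        return 0.0  # a one-sided split carries no information
    n = len(labels)
    remainder = len(low) / n * entropy(low) + len(high) / n * entropy(high)
    return entropy(labels) - remainder


def rank_genes_by_ig(samples, labels):
    """Gene indices sorted by descending information gain.

    Assumption: each gene is binarized at its mean expression value.
    """
    scores = []
    for g in range(len(samples[0])):
        column = [row[g] for row in samples]
        threshold = sum(column) / len(column)
        scores.append((information_gain(column, labels, threshold), g))
    return [g for _, g in sorted(scores, key=lambda s: (-s[0], s[1]))]


def knn_loo_accuracy(samples, labels, genes, k=1):
    """Leave-one-out accuracy of k-NN restricted to the given gene indices."""
    correct = 0
    for i, query in enumerate(samples):
        # Euclidean distance to every other sample, on the chosen genes only.
        neighbours = sorted(
            (math.dist([query[g] for g in genes],
                       [other[g] for g in genes]), labels[j])
            for j, other in enumerate(samples) if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(samples)


ranking = rank_genes_by_ig(TOY_SAMPLES, TOY_LABELS)
top_genes = ranking[:1]  # keep only the best-ranked gene
print(ranking, knn_loo_accuracy(TOY_SAMPLES, TOY_LABELS, top_genes))
```

On the toy data, gene 0 separates the two classes perfectly and is ranked first, while gene 2 takes the same values in both classes and is ranked last. Restricting K-NN to a gene like that misclassifies every sample, which is the kind of signal a subset search can use to discard uninformative genes.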

Similar resources

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data is still an issue, and finding particular documents demands time and energy. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

The K-nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines

In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Publication date: 2013